Consolidated Tree Construction Algorithm: Structurally Steady Trees
Authors
Abstract
This paper presents a new methodology for building decision trees (classification trees), the Consolidated Trees Construction algorithm, which addresses the structural instability that the paradigm exhibits when the training set varies slightly. As a consequence, the explanation of the classification is not lost, which distinguishes this technique from methods such as bagging and boosting, where the explanatory capacity of the classifier disappears. The methodology consists of a new meta-algorithm for building trees that are structurally steadier and less complex (consolidated trees), so that they keep their explanatory capacity and are faster, without losing discriminating capacity. The meta-algorithm uses C4.5 as the base classifier. In addition to the meta-algorithm, we propose a measure of structural diversity, used to analyse the stability of the structural component. This measure gives an estimate of the heterogeneity of a set of trees from the structural point of view. The results are compared with those obtained with C4.5 on several UCI Repository databases and on a real customer fidelisation application from an electrical appliance company.
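The abstract does not spell out how the trees are consolidated. The following is a minimal sketch, assuming that each subsample of the training set proposes a split at every node, that the most voted proposal is applied to all subsamples, and that the result is therefore a single tree. Plain information gain stands in for the C4.5 split criterion, and every name here (`proposal`, `consolidated_tree`, `predict`, the toy data) is hypothetical rather than taken from the paper.

```python
import math
import random
from collections import Counter

def entropy(rows):
    """Shannon entropy of the class labels in rows (a list of (features, label))."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, feature):
    """Information gain of splitting rows on one categorical feature."""
    groups = {}
    for x, label in rows:
        groups.setdefault(x[feature], []).append((x, label))
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

def proposal(rows, features):
    """Feature this subsample would split on, or None to vote for a leaf."""
    if not rows or not features:
        return None
    best = max(features, key=lambda f: info_gain(rows, f))
    return best if info_gain(rows, best) > 1e-12 else None

def consolidated_tree(subsamples, features, default=None):
    """Build ONE tree: every subsample votes, the winning split is applied to all."""
    labels = Counter(label for s in subsamples for _, label in s)
    if labels:
        default = labels.most_common(1)[0][0]
    votes = Counter(proposal(s, features) for s in subsamples)
    winner = votes.most_common(1)[0][0]
    if winner is None:  # assumption: a majority vote for "no split" makes a leaf
        return default
    values = sorted({x[winner] for s in subsamples for x, _ in s})
    rest = [f for f in features if f != winner]
    children = {}
    for v in values:
        parts = [[r for r in s if r[0][winner] == v] for s in subsamples]
        children[v] = consolidated_tree(parts, rest, default)
    return (winner, children)

def predict(tree, x, fallback=None):
    """Walk the tree; internal nodes are (feature, children) tuples, leaves are labels."""
    while isinstance(tree, tuple):
        feature, children = tree
        if x[feature] not in children:
            return fallback
        tree = children[x[feature]]
    return tree

# Toy data (invented for illustration): the label depends only on "outlook".
data = [({"outlook": o, "windy": w}, "play" if o == "sunny" else "stay")
        for _ in range(3) for o in ("sunny", "rain") for w in ("yes", "no")]
rng = random.Random(0)
subsamples = [rng.sample(data, 8) for _ in range(5)]
tree = consolidated_tree(subsamples, ["outlook", "windy"])
print(tree)
```

Because the split is decided by vote rather than by any single subsample, small changes in one subsample are less likely to change the structure of the tree, which is the kind of steadiness the abstract describes.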
Similar resources
Consolidated Trees: An Analysis of Structural Convergence
When different subsamples of the same data set are used to induce classification trees, the structures of the resulting classifiers differ considerably. The stability of the structure of the tree is of capital importance in many domains, such as illness diagnosis, fraud detection in different fields, customer behaviour analysis (marketing), etc., where comprehensibility of the classifier is necessary...
Behavior of Consolidated Trees when using Resampling Techniques
Many machine learning areas use subsampling techniques with different objectives: reducing the size of the training set, balancing class distributions, handling non-uniform error costs, etc. Subsampling severely affects the behavior of classification algorithms. Decision trees induced from different subsamples of the same data set differ greatly in accuracy and structure. This affects the expl...
A new algorithm to build consolidated trees: study of the error rate and steadiness
This paper presents a new methodology for building decision trees, the Consolidated Trees Construction algorithm, that improves the behavior of C4.5. It reduces the error and the complexity of the induced trees, with the differences in complexity being statistically significant. The advantage of this methodology with respect to other techniques such as bagging, boosting, etc. is that the final classif...
The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm
In many pattern recognition problems, the explanation of the classification becomes as important as the discriminating capacity of the classifier. For this kind of problem we can use the Consolidated Trees’ Construction (CTC) algorithm, which uses several subsamples to build a single tree. This paper presents a wide analysis of the behavior of the CTC algorithm for ...
CTC: An Alternative to Extract Explanation from Bagging
Given the importance of comprehensible classifiers when machine learning is used to solve real-world problems, bagging needs a way to be explained. This work compares the Consolidated Tree’s Construction (CTC) algorithm with the Combined Multiple Models (CMM) method proposed by Domingos when used to extract an explanation of the classification made by bagging. The comparison has been done...
Publication date: 2004